Baum-Welch Training for Segment-based Speech Recognition
Abstract
The use of segment-based features and segmentation networks in a segment-based speech recognizer complicates the probabilistic modeling because it alters the sample space of all possible segmentation paths and the feature observation space. This paper describes a novel Baum-Welch training algorithm for segment-based speech recognition which addresses these issues by an innovative use of finite-state transducers. This procedure has the desirable property of not requiring the initial seed models that were needed by the Viterbi training procedure we have used previously. On the PhoneBook telephone-based corpus of read, isolated words, the Baum-Welch training algorithm obtained a relative error reduction of 37% on the training set and a relative error reduction of 5% on the test set, compared to Viterbi-trained models. When combined with a duration model and a more flexible segmentation network, the Baum-Welch trained models obtain an overall word error rate of 7.6%, which is the best result we have seen published for the 8,000-word task.
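To make the contrast between the two training styles concrete, here is a minimal sketch on a toy frame-based discrete HMM. It is not the paper's segment-based, finite-state-transducer formulation; the model parameters, the observation sequence, and all function names are assumptions chosen for illustration. Baum-Welch accumulates fractional ("soft") counts from state posteriors over all paths, while Viterbi training accumulates whole ("hard") counts from the single best path.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Scaled forward-backward; returns state posteriors gamma[t, i]."""
    T, N = len(obs), len(pi)
    alpha, beta, scale = np.zeros((T, N)), np.zeros((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def viterbi(pi, A, B, obs):
    """Best state path in the log domain (assumes strictly positive parameters)."""
    T, N = len(obs), len(pi)
    logd = np.log(pi) + np.log(B[:, obs[0]])
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        cand = logd[:, None] + np.log(A)      # cand[i, j]: score of moving i -> j
        back[t] = cand.argmax(axis=0)
        logd = cand.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def soft_emission_counts(gamma, obs, K):
    """Baum-Welch style: each frame contributes its posterior to every state."""
    counts = np.zeros((gamma.shape[1], K))
    for t, o in enumerate(obs):
        counts[:, o] += gamma[t]
    return counts

def hard_emission_counts(path, obs, N, K):
    """Viterbi style: each frame contributes one count to the best-path state."""
    counts = np.zeros((N, K))
    for s, o in zip(path, obs):
        counts[s, o] += 1
    return counts

# Toy two-state, two-symbol model (illustrative values only).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 0, 1, 0, 1, 1]

gamma = forward_backward(pi, A, B, obs)
path = viterbi(pi, A, B, obs)
soft = soft_emission_counts(gamma, obs, K=2)
hard = hard_emission_counts(path, obs, N=2, K=2)
```

Normalizing each row of `soft` or `hard` gives the re-estimated emission probabilities for one iteration of the respective procedure. Both count tables assign the same total mass to each observed symbol; they differ only in how that mass is distributed across states.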
Similar resources
Comparative Study of the Baum-Welch and Viterbi Training Algorithms Applied to Read and Spontaneous Speech Recognition
In this paper we compare the performance of acoustic HMMs obtained through Viterbi training with that of acoustic HMMs obtained through the Baum-Welch algorithm. We present recognition results for discrete and continuous HMMs, for read and spontaneous speech databases, acquired at 8 and 16 kHz. We also present results for a combination of Viterbi and Baum-Welch training, intended as a trade-off...
Optimization-Based Control for the Extended Baum-Welch Algorithm
The extended Baum-Welch (EBW) is the most popular algorithm for discriminative training of speech recognition acoustic models. The EBW algorithm is usually controlled with heuristic rules, which are used to determine the smoothing parameters of the algorithm. In this paper we propose a control method for EBW which is based on the optimization of an error measure over a small control set. The la...
Segmentation of speech using speaker identification
This paper describes techniques for segmentation of conversational speech based on speaker identity. Speaker segmentation is performed using Viterbi decoding on a hidden Markov model network consisting of interconnected speaker sub-networks. Speaker sub-networks are initialized using Baum-Welch training on data labeled by speaker, and are iteratively retrained based on the previous segmentatio...
Efficient ML training of CDHMM parameters based on prior evolution, posterior intervention and feedback
We present an efficient maximum likelihood (ML) training procedure for Gaussian mixture continuous density hidden Markov model (CDHMM) parameters. This procedure is proposed using the concept of approximate prior evolution, posterior intervention and feedback (PEPIF). In a series of experiments for training CDHMMs for a continuous Mandarin Chinese speech recognition task, the new PEPIF procedur...
A comparative study on maximum entropy and discriminative training for acoustic modeling in automatic speech recognition
While Maximum Entropy (ME) based learning procedures have been successfully applied to text-based natural language processing, there has been little investigation into using ME for acoustic modeling in automatic speech recognition. In this paper we show that the well-known Generalized Iterative Scaling (GIS) algorithm can be used as an alternative method to discriminatively train the parameters ...
Publication date: 2003